Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 10000 |
| Missing cells | 8070 |
| Missing cells (%) | 4.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.6 MiB |
| Average record size in memory | 168.0 B |
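The headline figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and it assumes the missing-cell percentage is taken over all cells (variables × observations) and that total size is the average record size times the number of rows.

```python
# Figures copied from the Dataset statistics table.
n_variables = 20
n_observations = 10_000
missing_cells = 8_070
avg_record_bytes = 168.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory is roughly average record size times row count.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the values reported above (4.0% and 1.6 MiB). The 8070 missing cells also equal 5 × 1614, which matches the five variables that each report 1614 missing values.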
Variable types
| Categorical | 7 |
|---|---|
| DateTime | 2 |
| Numeric | 10 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| Airport_fee is highly overall correlated with fare_amount and 3 other fields | High correlation |
| RatecodeID is highly overall correlated with improvement_surcharge | High correlation |
| VendorID is highly overall correlated with extra | High correlation |
| congestion_surcharge is highly overall correlated with improvement_surcharge and 2 other fields | High correlation |
| extra is highly overall correlated with VendorID | High correlation |
| fare_amount is highly overall correlated with Airport_fee and 2 other fields | High correlation |
| improvement_surcharge is highly overall correlated with RatecodeID and 2 other fields | High correlation |
| mta_tax is highly overall correlated with congestion_surcharge and 2 other fields | High correlation |
| store_and_fwd_flag is highly overall correlated with trip_distance | High correlation |
| tip_amount is highly overall correlated with total_amount | High correlation |
| tolls_amount is highly overall correlated with Airport_fee | High correlation |
| total_amount is highly overall correlated with Airport_fee and 4 other fields | High correlation |
| trip_distance is highly overall correlated with Airport_fee and 4 other fields | High correlation |
| VendorID is highly imbalanced (51.9%) | Imbalance |
| store_and_fwd_flag is highly imbalanced (97.6%) | Imbalance |
| mta_tax is highly imbalanced (87.9%) | Imbalance |
| improvement_surcharge is highly imbalanced (86.9%) | Imbalance |
| congestion_surcharge is highly imbalanced (68.2%) | Imbalance |
| Airport_fee is highly imbalanced (73.3%) | Imbalance |
| passenger_count has 1614 (16.1%) missing values | Missing |
| RatecodeID has 1614 (16.1%) missing values | Missing |
| store_and_fwd_flag has 1614 (16.1%) missing values | Missing |
| congestion_surcharge has 1614 (16.1%) missing values | Missing |
| Airport_fee has 1614 (16.1%) missing values | Missing |
| trip_distance is highly skewed (γ1 = 99.99766163) | Skewed |
| trip_distance has 243 (2.4%) zeros | Zeros |
| extra has 5071 (50.7%) zeros | Zeros |
| tip_amount has 3268 (32.7%) zeros | Zeros |
| tolls_amount has 9374 (93.7%) zeros | Zeros |
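The imbalance percentages in the alerts are not explained in the report itself. A plausible reading, sketched below, is that the score is one minus the normalized Shannon entropy of the class counts, with missing values excluded. This is an assumption rather than something the report states, and `imbalance_score` is a hypothetical helper, but it reproduces the reported 51.9% for VendorID and 97.6% for store_and_fwd_flag from the counts in their Common Values tables.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    Assumption: missing values are left out of `counts`.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts taken from the Common Values tables of this report.
vendor_id = [7814, 2182, 4]
store_and_fwd_flag = [8366, 20]

print(f"VendorID: {100 * imbalance_score(vendor_id):.1f}%")
print(f"store_and_fwd_flag: {100 * imbalance_score(store_and_fwd_flag):.1f}%")
```

Under this reading, a high score for store_and_fwd_flag simply reflects that only 20 of 8386 non-missing trips have the flag set.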
Reproduction
| Analysis started | 2025-06-04 06:58:51.563815 |
|---|---|
| Analysis finished | 2025-06-04 06:59:47.909420 |
| Duration | 56.35 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
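As a quick consistency check, the reported duration can be recomputed from the two timestamps in the table above. This is only a sketch using the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, and microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2025-06-04 06:58:51.563815", FMT)
finished = datetime.strptime("2025-06-04 06:59:47.909420", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```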
Variables
VendorID
Categorical
High correlation  Imbalance 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 2 |
| 4th row | 2 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 7814 | 78.1% |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 2 | 7814 | |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 7814 | |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 10000 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 7814 | |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10000 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 7814 | |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10000 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 7814 | |
| 1 | 2182 | 21.8% |
| 7 | 4 | < 0.1% |
tpep_pickup_datetime
DateTime
| Distinct | 9979 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
| Minimum | 2025-01-01 00:02:23 |
|---|---|
| Maximum | 2025-01-31 23:59:50 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
tpep_dropoff_datetime
DateTime
| Distinct | 9979 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
| Minimum | 2025-01-01 00:13:19 |
|---|---|
| Maximum | 2025-02-01 00:24:05 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1614 |
| Missing (%) | 16.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3037205 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 77 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.76798056 |
|---|---|
| Coefficient of variation (CV) | 0.58906842 |
| Kurtosis | 10.786931 |
| Mean | 1.3037205 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.0059831 |
| Sum | 10933 |
| Variance | 0.58979415 |
| Monotonicity | Not monotonic |
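The descriptive statistics above appear to be computed over the non-missing values only. The sketch below checks that reading against the reported numbers; the variable names are illustrative, and the relationships used (mean = sum / valid count, CV = standard deviation / mean, variance = standard deviation squared) are the standard definitions rather than anything stated in the report.

```python
# Figures copied from the passenger_count tables.
n_observations = 10_000
n_missing = 1_614
value_sum = 10_933
std_dev = 0.76798056

# Assumption: missing values are excluded before computing statistics.
n_valid = n_observations - n_missing

mean = value_sum / n_valid
cv = std_dev / mean
variance = std_dev ** 2

print(f"valid rows: {n_valid}")
print(f"mean: {mean:.7f}")
print(f"coefficient of variation: {cv:.6f}")
print(f"variance: {variance:.6f}")
```

Dividing the sum by all 10000 rows would give about 1.09 instead, so the reported mean of 1.3037205 is only consistent with the 8386 non-missing rows.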
Common Values
| Value | Count | Frequency (%) |
| 1 | 6636 | 66.4% |
| 2 | 1116 | 11.2% |
| 3 | 290 | 2.9% |
| 4 | 177 | 1.8% |
| 0 | 77 | 0.8% |
| 5 | 53 | 0.5% |
| 6 | 37 | 0.4% |
| (Missing) | 1614 | 16.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 77 | 0.8% |
| 1 | 6636 | 66.4% |
| 2 | 1116 | 11.2% |
| 3 | 290 | 2.9% |
| 4 | 177 | 1.8% |
| 5 | 53 | 0.5% |
| 6 | 37 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 6 | 37 | 0.4% |
| 5 | 53 | 0.5% |
| 4 | 177 | 1.8% |
| 3 | 290 | 2.9% |
| 2 | 1116 | 11.2% |
| 1 | 6636 | 66.4% |
| 0 | 77 | 0.8% |
trip_distance
Real number (ℝ)
High correlation  Skewed  Zeros 
| Distinct | 1338 |
|---|---|
| Distinct (%) | 13.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.537008 |
| Minimum | 0 |
|---|---|
| Maximum | 104448.07 |
| Zeros | 243 |
| Zeros (%) | 2.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.4 |
| Q1 | 0.98 |
| median | 1.66 |
| Q3 | 3.09 |
| 95-th percentile | 12 |
| Maximum | 104448.07 |
| Range | 104448.07 |
| Interquartile range (IQR) | 2.11 |
Descriptive statistics
| Standard deviation | 1044.4579 |
|---|---|
| Coefficient of variation (CV) | 77.155743 |
| Kurtosis | 9999.6882 |
| Mean | 13.537008 |
| Median Absolute Deviation (MAD) | 0.85 |
| Skewness | 99.997662 |
| Sum | 135370.08 |
| Variance | 1090892.3 |
| Monotonicity | Not monotonic |
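The gap between the mean (13.54) and the median (1.66) suggests that a single extreme value dominates these statistics. The sketch below applies Tukey's 1.5 × IQR rule, a technique not used by the report itself, to the reported quartiles, and recomputes the mean with the maximum removed. All names are illustrative.

```python
# Figures copied from the trip_distance tables.
n_observations = 10_000
q1, q3 = 0.98, 3.09
maximum = 104448.07
value_sum = 135370.08

# Tukey's fences: values beyond Q3 + 1.5 * IQR are flagged as potential outliers.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Mean after dropping the single largest observation.
mean_without_max = (value_sum - maximum) / (n_observations - 1)

print(f"IQR: {iqr:.2f}")
print(f"upper fence: {upper_fence:.3f}")
print(f"maximum exceeds fence: {maximum > upper_fence}")
print(f"mean without the maximum: {mean_without_max:.2f}")
```

With the 104448.07 record removed, the mean falls to roughly 3.09, in line with the quartiles, which is why the skewness and kurtosis reported above are so large.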
Common Values
| Value | Count | Frequency (%) |
| 0 | 243 | 2.4% |
| 1.2 | 134 | 1.3% |
| 1 | 132 | 1.3% |
| 1.1 | 131 | 1.3% |
| 0.6 | 122 | 1.2% |
| 0.8 | 122 | 1.2% |
| 0.7 | 117 | 1.2% |
| 1.3 | 115 | 1.1% |
| 0.9 | 112 | 1.1% |
| 1.5 | 107 | 1.1% |
| Other values (1328) | 8665 | 86.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 243 | 2.4% |
| 0.01 | 36 | 0.4% |
| 0.02 | 9 | 0.1% |
| 0.03 | 3 | < 0.1% |
| 0.04 | 2 | < 0.1% |
| 0.05 | 1 | < 0.1% |
| 0.06 | 1 | < 0.1% |
| 0.07 | 2 | < 0.1% |
| 0.08 | 2 | < 0.1% |
| 0.09 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 104448.07 | 1 | < 0.1% |
| 53 | 1 | < 0.1% |
| 36.78 | 1 | < 0.1% |
| 36.26 | 1 | < 0.1% |
| 35.4 | 1 | < 0.1% |
| 35.27 | 1 | < 0.1% |
| 35.22 | 1 | < 0.1% |
| 34.74 | 1 | < 0.1% |
| 32.34 | 1 | < 0.1% |
| 32.3 | 1 | < 0.1% |
RatecodeID
Real number (ℝ)
High correlation  Missing 
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1614 |
| Missing (%) | 16.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.5149058 |
| Minimum | 1 |
|---|---|
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 11.780707 |
|---|---|
| Coefficient of variation (CV) | 4.6843531 |
| Kurtosis | 63.04852 |
| Mean | 2.5149058 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.0587705 |
| Sum | 21090 |
| Variance | 138.78505 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 1 | 7889 | 78.9% |
| 2 | 262 | 2.6% |
| 99 | 123 | 1.2% |
| 5 | 68 | 0.7% |
| 4 | 28 | 0.3% |
| 3 | 16 | 0.2% |
| (Missing) | 1614 | 16.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 7889 | 78.9% |
| 2 | 262 | 2.6% |
| 3 | 16 | 0.2% |
| 4 | 28 | 0.3% |
| 5 | 68 | 0.7% |
| 99 | 123 | 1.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99 | 123 | 1.2% |
| 5 | 68 | 0.7% |
| 4 | 28 | 0.3% |
| 3 | 16 | 0.2% |
| 2 | 262 | 2.6% |
| 1 | 7889 | 78.9% |
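RatecodeID is profiled as a real number, so a code such as 99 inflates the mean and standard deviation even though it occurs in only 123 rows. The sketch below rebuilds the mean from the counts above and then recomputes it without the 99 rows. Treating 99 as an "unknown" placeholder rather than a genuine rate code is an assumption made for illustration; the report does not say what the code means.

```python
# Value counts copied from the RatecodeID Common Values table (missing rows excluded).
counts = {1: 7889, 2: 262, 99: 123, 5: 68, 4: 28, 3: 16}

n_valid = sum(counts.values())
value_sum = sum(value * count for value, count in counts.items())
mean_all = value_sum / n_valid

# Assumption: 99 is a placeholder code and should not contribute to the mean.
known = {value: count for value, count in counts.items() if value != 99}
mean_known = sum(v * c for v, c in known.items()) / sum(known.values())

print(f"sum: {value_sum}, valid rows: {n_valid}")
print(f"mean including 99: {mean_all:.4f}")
print(f"mean excluding 99: {mean_known:.4f}")
```

The sum and mean including 99 match the reported 21090 and 2.5149058. Excluding it drops the mean to about 1.08, which is closer to what the quantiles suggest, and is one reason a code column like this is often better treated as categorical.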
store_and_fwd_flag
Boolean
High correlation  Imbalance  Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1614 |
| Missing (%) | 16.1% |
| Memory size | 97.7 KiB |
Common Values
| Value | Count | Frequency (%) |
| False | 8366 | 83.7% |
| True | 20 | 0.2% |
| (Missing) | 1614 | 16.1% |
PULocationID
Real number (ℝ)
| Distinct | 178 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 166.423 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 132 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 102 |
Descriptive statistics
| Standard deviation | 64.524239 |
|---|---|
| Coefficient of variation (CV) | 0.38771227 |
| Kurtosis | -0.84403529 |
| Mean | 166.423 |
| Median Absolute Deviation (MAD) | 64 |
| Skewness | -0.29412145 |
| Sum | 1664230 |
| Variance | 4163.3774 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 161 | 508 | 5.1% |
| 237 | 473 | 4.7% |
| 236 | 451 | 4.5% |
| 132 | 427 | 4.3% |
| 230 | 368 | 3.7% |
| 162 | 343 | 3.4% |
| 186 | 325 | 3.2% |
| 234 | 293 | 2.9% |
| 142 | 286 | 2.9% |
| 170 | 282 | 2.8% |
| Other values (168) | 6244 | 62.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 4 | 24 | 0.2% |
| 6 | 1 | < 0.1% |
| 7 | 8 | 0.1% |
| 10 | 5 | 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 65 | 0.7% |
| 14 | 1 | < 0.1% |
| 17 | 3 | < 0.1% |
| 19 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 265 | 2 | < 0.1% |
| 264 | 24 | 0.2% |
| 263 | 216 | 2.2% |
| 262 | 163 | 1.6% |
| 261 | 57 | 0.6% |
| 260 | 2 | < 0.1% |
| 258 | 3 | < 0.1% |
| 257 | 2 | < 0.1% |
| 256 | 4 | < 0.1% |
| 255 | 7 | 0.1% |
DOLocationID
Real number (ℝ)
| Distinct | 207 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 163.579 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 43 |
| Q1 | 113 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 257 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 121 |
Descriptive statistics
| Standard deviation | 69.584249 |
|---|---|
| Coefficient of variation (CV) | 0.4253862 |
| Kurtosis | -0.94946724 |
| Mean | 163.579 |
| Median Absolute Deviation (MAD) | 69 |
| Skewness | -0.34845111 |
| Sum | 1635790 |
| Variance | 4841.9678 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 236 | 451 | 4.5% |
| 237 | 433 | 4.3% |
| 161 | 392 | 3.9% |
| 170 | 298 | 3.0% |
| 230 | 293 | 2.9% |
| 239 | 283 | 2.8% |
| 186 | 269 | 2.7% |
| 141 | 268 | 2.7% |
| 48 | 263 | 2.6% |
| 142 | 261 | 2.6% |
| Other values (197) | 6789 | 67.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 16 | 0.2% |
| 3 | 2 | < 0.1% |
| 4 | 52 | 0.5% |
| 7 | 17 | 0.2% |
| 10 | 8 | 0.1% |
| 11 | 3 | < 0.1% |
| 12 | 3 | < 0.1% |
| 13 | 62 | 0.6% |
| 14 | 9 | 0.1% |
| 15 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 265 | 38 | 0.4% |
| 264 | 36 | 0.4% |
| 263 | 195 | 1.9% |
| 262 | 168 | 1.7% |
| 261 | 50 | 0.5% |
| 260 | 7 | 0.1% |
| 259 | 1 | < 0.1% |
| 258 | 3 | < 0.1% |
| 257 | 6 | 0.1% |
| 256 | 23 | 0.2% |
payment_type
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 4 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 6957 | 69.6% |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 1 | 6957 | |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 6957 | |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 10000 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 6957 | |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10000 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 6957 | |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10000 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 6957 | |
| 0 | 1614 | 16.1% |
| 2 | 1160 | 11.6% |
| 4 | 206 | 2.1% |
| 3 | 63 | 0.6% |
fare_amount
Real number (ℝ)
High correlation 
| Distinct | 1300 |
|---|---|
| Distinct (%) | 13.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.729521 |
| Minimum | -232.6 |
|---|---|
| Maximum | 250 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 416 |
| Negative (%) | 4.2% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | -232.6 |
|---|---|
| 5-th percentile | 4.4 |
| Q1 | 8.6 |
| median | 12.1 |
| Q3 | 19.1625 |
| 95-th percentile | 50.6 |
| Maximum | 250 |
| Range | 482.6 |
| Interquartile range (IQR) | 10.5625 |
Descriptive statistics
| Standard deviation | 17.556368 |
|---|---|
| Coefficient of variation (CV) | 1.0494244 |
| Kurtosis | 19.518952 |
| Mean | 16.729521 |
| Median Absolute Deviation (MAD) | 4.9 |
| Skewness | 2.1104963 |
| Sum | 167295.21 |
| Variance | 308.22606 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 9.3 | 448 | 4.5% |
| 7.9 | 434 | 4.3% |
| 10 | 414 | 4.1% |
| 7.2 | 401 | 4.0% |
| 8.6 | 387 | 3.9% |
| 10.7 | 374 | 3.7% |
| 6.5 | 373 | 3.7% |
| 12.1 | 364 | 3.6% |
| 11.4 | 363 | 3.6% |
| 5.8 | 314 | 3.1% |
| Other values (1290) | 6128 | 61.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -232.6 | 1 | < 0.1% |
| -89.9 | 1 | < 0.1% |
| -81.8 | 1 | < 0.1% |
| -71.6 | 1 | < 0.1% |
| -70 | 11 | 0.1% |
| -66 | 1 | < 0.1% |
| -62 | 1 | < 0.1% |
| -61.8 | 1 | < 0.1% |
| -58.3 | 1 | < 0.1% |
| -56.2 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 250 | 1 | < 0.1% |
| 202.5 | 1 | < 0.1% |
| 200 | 1 | < 0.1% |
| 193.4 | 1 | < 0.1% |
| 184.3 | 1 | < 0.1% |
| 162.6 | 1 | < 0.1% |
| 158.4 | 2 | < 0.1% |
| 150 | 1 | < 0.1% |
| 146.5 | 2 | < 0.1% |
| 145.1 | 1 | < 0.1% |
extra
Real number (ℝ)
High correlation  Zeros 
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.332225 |
| Minimum | -7.5 |
|---|---|
| Maximum | 12.5 |
| Zeros | 5071 |
| Zeros (%) | 50.7% |
| Negative | 67 |
| Negative (%) | 0.7% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | -7.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2.5 |
| 95-th percentile | 5 |
| Maximum | 12.5 |
| Range | 20 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 1.8641433 |
|---|---|
| Coefficient of variation (CV) | 1.3992706 |
| Kurtosis | 3.0001168 |
| Mean | 1.332225 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.518012 |
| Sum | 13322.25 |
| Variance | 3.4750303 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 5071 | 50.7% |
| 1 | 1550 | 15.5% |
| 2.5 | 1530 | 15.3% |
| 3.25 | 593 | 5.9% |
| 4.25 | 324 | 3.2% |
| 5 | 308 | 3.1% |
| 5.75 | 221 | 2.2% |
| 3.5 | 109 | 1.1% |
| 6 | 64 | 0.6% |
| 7.5 | 48 | 0.5% |
| Other values (21) | 182 | 1.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -7.5 | 1 | < 0.1% |
| -6 | 1 | < 0.1% |
| -5 | 4 | < 0.1% |
| -2.5 | 26 | 0.3% |
| -1 | 33 | 0.3% |
| -0.75 | 2 | < 0.1% |
| 0 | 5071 | 50.7% |
| 0.75 | 7 | 0.1% |
| 1 | 1550 | 15.5% |
| 1.75 | 9 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 12.5 | 6 | 0.1% |
| 11.75 | 1 | < 0.1% |
| 11 | 7 | 0.1% |
| 10.75 | 2 | < 0.1% |
| 10.25 | 2 | < 0.1% |
| 10 | 19 | 0.2% |
| 9.25 | 13 | 0.1% |
| 8.5 | 1 | < 0.1% |
| 8.25 | 12 | 0.1% |
| 7.75 | 9 | 0.1% |
mta_tax
Categorical
High correlation  Imbalance 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0155 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | -0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
| 0.5 | 9752 | 97.5% |
| -0.5 | 155 | 1.6% |
| 0.0 | 93 | 0.9% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 0.5 | 9907 | |
| 0.0 | 93 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 10093 | |
| . | 10000 | |
| 5 | 9907 | |
| - | 155 | 0.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 20000 | |
| Other Punctuation | 10000 | |
| Dash Punctuation | 155 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 10093 | |
| 5 | 9907 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 10000 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 155 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 30155 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 10093 | |
| . | 10000 | |
| 5 | 9907 | |
| - | 155 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 30155 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 10093 | |
| . | 10000 | |
| 5 | 9907 | |
| - | 155 | 0.5% |
tip_amount
Real number (ℝ)
High correlation  Zeros 
| Distinct | 915 |
|---|---|
| Distinct (%) | 9.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.893703 |
| Minimum | 0 |
|---|---|
| Maximum | 50.2 |
| Zeros | 3268 |
| Zeros (%) | 32.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2.39 |
| Q3 | 3.9 |
| 95-th percentile | 10 |
| Maximum | 50.2 |
| Range | 50.2 |
| Interquartile range (IQR) | 3.9 |
Descriptive statistics
| Standard deviation | 3.5922181 |
|---|---|
| Coefficient of variation (CV) | 1.2413914 |
| Kurtosis | 15.06332 |
| Mean | 2.893703 |
| Median Absolute Deviation (MAD) | 2.26 |
| Skewness | 2.9142675 |
| Sum | 28937.03 |
| Variance | 12.904031 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 3268 | 32.7% |
| 2 | 453 | 4.5% |
| 1 | 362 | 3.6% |
| 3 | 177 | 1.8% |
| 5 | 97 | 1.0% |
| 2.95 | 85 | 0.9% |
| 1.5 | 77 | 0.8% |
| 4 | 72 | 0.7% |
| 2.8 | 64 | 0.6% |
| 3.37 | 58 | 0.6% |
| Other values (905) | 5287 | 52.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 3268 | 32.7% |
| 0.01 | 5 | 0.1% |
| 0.02 | 3 | < 0.1% |
| 0.03 | 1 | < 0.1% |
| 0.05 | 2 | < 0.1% |
| 0.08 | 2 | < 0.1% |
| 0.1 | 3 | < 0.1% |
| 0.11 | 1 | < 0.1% |
| 0.45 | 1 | < 0.1% |
| 0.49 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 50.2 | 1 | < 0.1% |
| 43.45 | 1 | < 0.1% |
| 39.3 | 1 | < 0.1% |
| 37.6 | 1 | < 0.1% |
| 35.09 | 1 | < 0.1% |
| 33.37 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 31.68 | 1 | < 0.1% |
| 30.36 | 1 | < 0.1% |
| 30.04 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
High correlation  Zeros 
| Distinct | 36 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.445742 |
| Minimum | -27.88 |
|---|---|
| Maximum | 30.94 |
| Zeros | 9374 |
| Zeros (%) | 93.7% |
| Negative | 13 |
| Negative (%) | 0.1% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | -27.88 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.94 |
| Maximum | 30.94 |
| Range | 58.82 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.9530841 |
|---|---|
| Coefficient of variation (CV) | 4.381647 |
| Kurtosis | 39.374332 |
| Mean | 0.445742 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.4758295 |
| Sum | 4457.42 |
| Variance | 3.8145375 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9374 | 93.7% |
| 6.94 | 560 | 5.6% |
| -6.94 | 11 | 0.1% |
| 3.18 | 7 | 0.1% |
| 16.06 | 4 | < 0.1% |
| 11.19 | 4 | < 0.1% |
| 13.88 | 4 | < 0.1% |
| 15.38 | 4 | < 0.1% |
| 14.06 | 3 | < 0.1% |
| 23 | 2 | < 0.1% |
| Other values (26) | 27 | 0.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -27.88 | 1 | < 0.1% |
| -14.06 | 1 | < 0.1% |
| -6.94 | 11 | 0.1% |
| 0 | 9374 | 93.7% |
| 2 | 1 | < 0.1% |
| 2.6 | 2 | < 0.1% |
| 3.18 | 7 | 0.1% |
| 4 | 1 | < 0.1% |
| 5.2 | 1 | < 0.1% |
| 6.94 | 560 | 5.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 30.94 | 1 | < 0.1% |
| 27.94 | 1 | < 0.1% |
| 24.38 | 1 | < 0.1% |
| 24.06 | 1 | < 0.1% |
| 23 | 2 | < 0.1% |
| 22.56 | 1 | < 0.1% |
| 20.94 | 1 | < 0.1% |
| 20.88 | 1 | < 0.1% |
| 20.32 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
improvement_surcharge
Categorical
High correlation  Imbalance 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.016 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | -1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 9726 | 97.3% |
| -1.0 | 160 | 1.6% |
| 0.0 | 114 | 1.1% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 1.0 | 9886 | |
| 0.0 | 114 | 1.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 10114 | |
| . | 10000 | |
| 1 | 9886 | |
| - | 160 | 0.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 20000 | |
| Other Punctuation | 10000 | |
| Dash Punctuation | 160 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 10114 | |
| 1 | 9886 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 10000 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 160 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 30160 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 10114 | |
| . | 10000 | |
| 1 | 9886 | |
| - | 160 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 30160 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 10114 | |
| . | 10000 | |
| 1 | 9886 | |
| - | 160 | 0.5% |
total_amount
Real number (ℝ)
High correlation 
| Distinct | 2827 |
|---|---|
| Distinct (%) | 28.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.209691 |
| Minimum | -236.85 |
|---|---|
| Maximum | 301.2 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 166 |
| Negative (%) | 1.7% |
| Memory size | 156.2 KiB |
Quantile statistics
| Minimum | -236.85 |
|---|---|
| 5-th percentile | 9.097 |
| Q1 | 15.18 |
| median | 19.945 |
| Q3 | 27.69 |
| 95-th percentile | 72.42 |
| Maximum | 301.2 |
| Range | 538.05 |
| Interquartile range (IQR) | 12.51 |
Descriptive statistics
| Standard deviation | 21.756773 |
|---|---|
| Coefficient of variation (CV) | 0.86303211 |
| Kurtosis | 15.879141 |
| Mean | 25.209691 |
| Median Absolute Deviation (MAD) | 5.605 |
| Skewness | 2.1646193 |
| Sum | 252096.91 |
| Variance | 473.35717 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 17.7 | 79 | 0.8% |
| 20.22 | 58 | 0.6% |
| 16.02 | 56 | 0.6% |
| 21.9 | 56 | 0.6% |
| 16.86 | 53 | 0.5% |
| 21.06 | 52 | 0.5% |
| 16.8 | 49 | 0.5% |
| 13.5 | 48 | 0.5% |
| 14.7 | 45 | 0.4% |
| 18.06 | 45 | 0.4% |
| Other values (2817) | 9459 | 94.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
| -236.85 | 1 | < 0.1% |
| -105.98 | 1 | < 0.1% |
| -90.9 | 1 | < 0.1% |
| -85.55 | 1 | < 0.1% |
| -83.44 | 2 | < 0.1% |
| -82.69 | 3 | < 0.1% |
| -80.94 | 1 | < 0.1% |
| -79 | 1 | < 0.1% |
| -77.81 | 1 | < 0.1% |
| -77.69 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 301.2 | 1 | < 0.1% |
| 241.5 | 1 | < 0.1% |
| 237.53 | 1 | < 0.1% |
| 217.24 | 1 | < 0.1% |
| 214.71 | 1 | < 0.1% |
| 212.28 | 1 | < 0.1% |
| 210 | 1 | < 0.1% |
| 201.96 | 1 | < 0.1% |
| 190.06 | 1 | < 0.1% |
| 188.09 | 1 | < 0.1% |
congestion_surcharge
Categorical
High correlation  Imbalance  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1614 |
| Missing (%) | 16.1% |
| Memory size | 156.2 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0151443 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 2.5 |
| 3rd row | -2.5 |
| 4th row | 2.5 |
| 5th row | 2.5 |
Common Values
| Value | Count | Frequency (%) |
| 2.5 | 7610 | 76.1% |
| 0.0 | 649 | 6.5% |
| -2.5 | 127 | 1.3% |
| (Missing) | 1614 | 16.1% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 2.5 | 7737 | |
| 0.0 | 649 | 7.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 8386 | |
| 2 | 7737 | |
| 5 | 7737 | |
| 0 | 1298 | 5.1% |
| - | 127 | 0.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16772 | |
| Other Punctuation | 8386 | |
| Dash Punctuation | 127 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 7737 | |
| 5 | 7737 | |
| 0 | 1298 | 7.7% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 8386 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 127 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 25285 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 8386 | |
| 2 | 7737 | |
| 5 | 7737 | |
| 0 | 1298 | 5.1% |
| - | 127 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 25285 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 8386 | |
| 2 | 7737 | |
| 5 | 7737 | |
| 0 | 1298 | 5.1% |
| - | 127 | 0.5% |
Airport_fee
Categorical
High correlation  Imbalance  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1614 |
| Missing (%) | 16.1% |
| Memory size | 156.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.0835917 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.75 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 7718 | 77.2% |
| 1.75 | 635 | 6.3% |
| -1.75 | 33 | 0.3% |
| (Missing) | 1614 | 16.1% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 0.0 | 7718 | |
| 1.75 | 668 | 8.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 15436 | |
| . | 8386 | |
| 1 | 668 | 2.6% |
| 7 | 668 | 2.6% |
| 5 | 668 | 2.6% |
| - | 33 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 17440 | |
| Other Punctuation | 8386 | |
| Dash Punctuation | 33 | 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 15436 | |
| 1 | 668 | 3.8% |
| 7 | 668 | 3.8% |
| 5 | 668 | 3.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 8386 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 33 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 25859 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 15436 | |
| . | 8386 | |
| 1 | 668 | 2.6% |
| 7 | 668 | 2.6% |
| 5 | 668 | 2.6% |
| - | 33 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 25859 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 15436 | |
| . | 8386 | |
| 1 | 668 | 2.6% |
| 7 | 668 | 2.6% |
| 5 | 668 | 2.6% |
| - | 33 | 0.1% |
cbd_congestion_fee
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 156.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 4 |
| Mean length | 3.6555 |
| Min length | 3 |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.75 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.75 |
Common Values
| Value | Count | Frequency (%) |
| 0.75 | 6521 | 65.2% |
| 0.0 | 3462 | 34.6% |
| -0.75 | 17 | 0.2% |
Length
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| 0.75 | 6538 | |
| 0.0 | 3462 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 13462 | |
| . | 10000 | |
| 7 | 6538 | |
| 5 | 6538 | |
| - | 17 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 26538 | |
| Other Punctuation | 10000 | 27.4% |
| Dash Punctuation | 17 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 13462 | |
| 7 | 6538 | |
| 5 | 6538 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 10000 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 17 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 36555 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 13462 | |
| . | 10000 | |
| 7 | 6538 | |
| 5 | 6538 | |
| - | 17 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 36555 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 13462 | |
| . | 10000 | |
| 7 | 6538 | |
| 5 | 6538 | |
| - | 17 | < 0.1% |
Interactions
Correlations
| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | cbd_congestion_fee | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport_fee | 1.000 | 0.076 | 0.366 | 0.032 | 0.035 | 0.133 | 0.331 | 0.429 | 0.605 | 0.320 | 0.310 | 0.028 | 0.202 | 0.000 | 0.374 | 0.522 | 0.640 | 1.000 |
| DOLocationID | 0.076 | 1.000 | 0.086 | -0.047 | 0.013 | 0.142 | 0.134 | 0.012 | -0.087 | 0.050 | 0.062 | -0.006 | 0.059 | 0.000 | 0.039 | -0.043 | -0.070 | -0.095 |
| PULocationID | 0.366 | 0.086 | 1.000 | -0.132 | 0.017 | 0.174 | 0.206 | -0.013 | -0.125 | 0.073 | 0.044 | -0.010 | 0.059 | 0.000 | -0.005 | -0.118 | -0.116 | -0.145 |
| RatecodeID | 0.032 | -0.047 | -0.132 | 1.000 | 0.220 | 0.161 | 0.417 | -0.135 | 0.347 | 0.941 | 0.014 | 0.050 | 0.052 | 0.000 | 0.044 | 0.471 | 0.321 | 0.286 |
| VendorID | 0.035 | 0.013 | 0.017 | 0.220 | 1.000 | 0.025 | 0.065 | 0.509 | 0.027 | 0.150 | 0.045 | 0.138 | 0.063 | 0.070 | 0.019 | 0.000 | 0.033 | 0.000 |
| cbd_congestion_fee | 0.133 | 0.142 | 0.174 | 0.161 | 0.025 | 1.000 | 0.384 | 0.202 | 0.140 | 0.273 | 0.258 | 0.027 | 0.139 | 0.028 | 0.050 | 0.109 | 0.131 | 0.000 |
| congestion_surcharge | 0.331 | 0.134 | 0.206 | 0.417 | 0.065 | 0.384 | 1.000 | 0.312 | 0.294 | 0.693 | 0.657 | 0.028 | 0.369 | 0.002 | 0.113 | 0.241 | 0.392 | 1.000 |
| extra | 0.429 | 0.012 | -0.013 | -0.135 | 0.509 | 0.202 | 0.312 | 1.000 | 0.059 | 0.311 | 0.305 | -0.075 | 0.244 | 0.103 | 0.299 | 0.129 | 0.189 | 0.048 |
| fare_amount | 0.605 | -0.087 | -0.125 | 0.347 | 0.027 | 0.140 | 0.294 | 0.059 | 1.000 | 0.341 | 0.406 | 0.033 | 0.164 | 0.000 | 0.350 | 0.393 | 0.957 | 0.798 |
| improvement_surcharge | 0.320 | 0.050 | 0.073 | 0.941 | 0.150 | 0.273 | 0.693 | 0.311 | 0.341 | 1.000 | 0.697 | 0.044 | 0.411 | 0.000 | 0.037 | 0.218 | 0.456 | 0.000 |
| mta_tax | 0.310 | 0.062 | 0.044 | 0.014 | 0.045 | 0.258 | 0.657 | 0.305 | 0.406 | 0.697 | 1.000 | 0.058 | 0.406 | 0.000 | 0.167 | 0.359 | 0.506 | 0.000 |
| passenger_count | 0.028 | -0.006 | -0.010 | 0.050 | 0.138 | 0.027 | 0.028 | -0.075 | 0.033 | 0.044 | 0.058 | 1.000 | 0.031 | 0.000 | 0.003 | 0.043 | 0.030 | 0.031 |
| payment_type | 0.202 | 0.059 | 0.059 | 0.052 | 0.063 | 0.139 | 0.369 | 0.244 | 0.164 | 0.411 | 0.406 | 0.031 | 1.000 | 0.019 | 0.117 | 0.102 | 0.196 | 0.011 |
| store_and_fwd_flag | 0.000 | 0.000 | 0.000 | 0.000 | 0.070 | 0.028 | 0.002 | 0.103 | 0.000 | 0.000 | 0.000 | 0.000 | 0.019 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| tip_amount | 0.374 | 0.039 | -0.005 | 0.044 | 0.019 | 0.050 | 0.113 | 0.299 | 0.350 | 0.037 | 0.167 | 0.003 | 0.117 | 0.000 | 1.000 | 0.220 | 0.525 | 0.279 |
| tolls_amount | 0.522 | -0.043 | -0.118 | 0.471 | 0.000 | 0.109 | 0.241 | 0.129 | 0.393 | 0.218 | 0.359 | 0.043 | 0.102 | 0.000 | 0.220 | 1.000 | 0.402 | 0.368 |
| total_amount | 0.640 | -0.070 | -0.116 | 0.321 | 0.033 | 0.131 | 0.392 | 0.189 | 0.957 | 0.456 | 0.506 | 0.030 | 0.196 | 0.000 | 0.525 | 0.402 | 1.000 | 0.769 |
| trip_distance | 1.000 | -0.095 | -0.145 | 0.286 | 0.000 | 0.000 | 1.000 | 0.048 | 0.798 | 0.000 | 0.000 | 0.031 | 0.011 | 1.000 | 0.279 | 0.368 | 0.769 | 1.000 |
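The matrix above can be related back to the "High correlation" alerts in the overview. The sketch below takes the Airport_fee row and lists the fields whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption chosen because it reproduces the alert for this row. The report itself does not state the threshold, and `highly_correlated` is a hypothetical helper.

```python
# Airport_fee row of the correlation matrix above (self-correlation omitted).
airport_fee_row = {
    "DOLocationID": 0.076, "PULocationID": 0.366, "RatecodeID": 0.032,
    "VendorID": 0.035, "cbd_congestion_fee": 0.133,
    "congestion_surcharge": 0.331, "extra": 0.429, "fare_amount": 0.605,
    "improvement_surcharge": 0.320, "mta_tax": 0.310,
    "passenger_count": 0.028, "payment_type": 0.202,
    "store_and_fwd_flag": 0.000, "tip_amount": 0.374,
    "tolls_amount": 0.522, "total_amount": 0.640, "trip_distance": 1.000,
}

THRESHOLD = 0.5  # assumed cutoff; not stated anywhere in the report


def highly_correlated(row, threshold=THRESHOLD):
    """Return the field names whose |coefficient| is at least `threshold`."""
    return sorted(name for name, value in row.items() if abs(value) >= threshold)


flagged = highly_correlated(airport_fee_row)
print(flagged)
```

With this cutoff the row yields four fields (fare_amount, tolls_amount, total_amount, trip_distance), which agrees with the overview alert "Airport_fee is highly overall correlated with fare_amount and 3 other fields".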
Missing values
Sample
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 561277 | 1 | 2025-01-07 19:24:13 | 2025-01-07 20:11:51 | 1.0 | 18.80 | 1.0 | N | 132 | 26 | 1 | 72.3 | 4.25 | 0.5 | 0.00 | 0.0 | 1.0 | 78.05 | 0.0 | 1.75 | 0.00 |
| 1042820 | 1 | 2025-01-12 17:02:55 | 2025-01-12 17:17:34 | 1.0 | 1.90 | 1.0 | N | 107 | 246 | 1 | 13.5 | 3.25 | 0.5 | 3.65 | 0.0 | 1.0 | 21.90 | 2.5 | 0.00 | 0.75 |
| 26146 | 2 | 2025-01-01 09:55:16 | 2025-01-01 09:59:51 | 1.0 | 1.17 | 1.0 | N | 161 | 234 | 4 | -7.9 | 0.00 | -0.5 | 0.00 | 0.0 | -1.0 | -11.90 | -2.5 | 0.00 | 0.00 |
| 2345411 | 2 | 2025-01-25 22:30:52 | 2025-01-25 22:36:14 | 1.0 | 0.77 | 1.0 | N | 263 | 236 | 1 | 7.2 | 1.00 | 0.5 | 2.44 | 0.0 | 1.0 | 14.64 | 2.5 | 0.00 | 0.00 |
| 2388539 | 1 | 2025-01-26 12:48:51 | 2025-01-26 13:09:40 | 1.0 | 2.80 | 1.0 | N | 229 | 114 | 1 | 12.8 | 3.25 | 0.5 | 3.50 | 0.0 | 1.0 | 21.05 | 2.5 | 0.00 | 0.75 |
| 2957627 | 2 | 2025-01-03 22:20:01 | 2025-01-03 22:32:12 | NaN | 1.01 | NaN | NaN | 100 | 161 | 0 | 13.1 | 0.00 | 0.5 | 0.00 | 0.0 | 1.0 | 17.10 | NaN | NaN | 0.00 |
| 1639838 | 1 | 2025-01-18 16:10:41 | 2025-01-18 16:24:35 | 1.0 | 1.60 | 1.0 | N | 264 | 264 | 2 | 13.5 | 3.25 | 0.5 | 0.00 | 0.0 | 1.0 | 18.25 | 2.5 | 0.00 | 0.75 |
| 2532909 | 1 | 2025-01-28 09:28:09 | 2025-01-28 09:33:22 | 1.0 | 0.40 | 1.0 | N | 237 | 237 | 1 | 5.8 | 2.50 | 0.5 | 2.00 | 0.0 | 1.0 | 11.80 | 2.5 | 0.00 | 0.00 |
| 391680 | 2 | 2025-01-05 17:23:46 | 2025-01-05 17:30:01 | 1.0 | 1.40 | 1.0 | N | 249 | 68 | 1 | 8.6 | 0.00 | 0.5 | 5.00 | 0.0 | 1.0 | 18.35 | 2.5 | 0.00 | 0.75 |
| 1904355 | 2 | 2025-01-21 19:01:35 | 2025-01-21 19:11:46 | 1.0 | 1.39 | 1.0 | N | 230 | 234 | 1 | 10.7 | 2.50 | 0.5 | 3.59 | 0.0 | 1.0 | 21.54 | 2.5 | 0.00 | 0.75 |
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2408297 | 1 | 2025-01-26 16:45:21 | 2025-01-26 16:58:31 | 1.0 | 2.20 | 1.0 | N | 231 | 186 | 1 | 14.20 | 3.25 | 0.5 | 3.75 | 0.0 | 1.0 | 22.70 | 2.5 | 0.0 | 0.75 |
| 651552 | 2 | 2025-01-08 18:18:26 | 2025-01-08 18:20:27 | 1.0 | 0.37 | 1.0 | N | 141 | 263 | 1 | 4.40 | 2.50 | 0.5 | 2.00 | 0.0 | 1.0 | 12.90 | 2.5 | 0.0 | 0.00 |
| 1553371 | 2 | 2025-01-17 18:30:01 | 2025-01-17 18:44:06 | 2.0 | 1.72 | 1.0 | N | 107 | 161 | 1 | 13.50 | 2.50 | 0.5 | 2.08 | 0.0 | 1.0 | 22.83 | 2.5 | 0.0 | 0.75 |
| 1054422 | 2 | 2025-01-12 19:39:35 | 2025-01-12 19:44:38 | 2.0 | 0.84 | 1.0 | N | 113 | 79 | 1 | 6.50 | 0.00 | 0.5 | 3.38 | 0.0 | 1.0 | 14.63 | 2.5 | 0.0 | 0.75 |
| 145165 | 2 | 2025-01-02 20:53:27 | 2025-01-02 21:08:58 | 1.0 | 3.80 | 1.0 | N | 162 | 262 | 1 | 19.80 | 1.00 | 0.5 | 6.20 | 0.0 | 1.0 | 31.00 | 2.5 | 0.0 | 0.00 |
| 20399 | 2 | 2025-01-01 04:23:33 | 2025-01-01 04:27:17 | 1.0 | 0.44 | 1.0 | N | 230 | 48 | 2 | 5.80 | 1.00 | 0.5 | 0.00 | 0.0 | 1.0 | 10.80 | 2.5 | 0.0 | 0.00 |
| 966668 | 2 | 2025-01-11 20:01:42 | 2025-01-11 20:24:06 | 1.0 | 5.19 | 1.0 | N | 170 | 87 | 1 | 26.10 | 1.00 | 0.5 | 6.37 | 0.0 | 1.0 | 38.22 | 2.5 | 0.0 | 0.75 |
| 3034464 | 1 | 2025-01-11 20:24:35 | 2025-01-11 20:33:36 | NaN | 0.90 | NaN | NaN | 140 | 237 | 0 | 11.92 | 0.00 | 0.5 | 0.00 | 0.0 | 1.0 | 15.92 | NaN | NaN | 0.00 |
| 813372 | 2 | 2025-01-10 12:06:05 | 2025-01-10 12:18:25 | 1.0 | 1.50 | 1.0 | N | 90 | 68 | 1 | 12.80 | 0.00 | 0.5 | 1.00 | 0.0 | 1.0 | 18.55 | 2.5 | 0.0 | 0.75 |
| 501158 | 1 | 2025-01-07 09:01:07 | 2025-01-07 09:10:31 | 1.0 | 1.80 | 1.0 | N | 236 | 142 | 1 | 11.40 | 2.50 | 0.5 | 0.00 | 0.0 | 1.0 | 15.40 | 2.5 | 0.0 | 0.00 |